Cross-language analysis of world regions in the press

An empirical approach based on Wikidata

Claude Grasland & Etienne Toureille

19/10/2021

1. INTRODUCTION

Previous analyses of German (left) and French (right) newspapers have shown the value of analysing networks of states and world regions:

But before validating these results, we need:

  1. to clarify our definition of world regions and the associated list of target units;
  2. to extend the dictionary to other languages (Turkish, Arabic, English).

Objectives

The objective of this short note is to explore the potential of Wikidata for producing multilingual dictionaries of world regions and, more generally, of regional imaginations. We distinguish different types of “regions”, related either to the division of the Earth (“natural”) or to the division of the World (“political”).

But the difference is not clear: see Grataloup (2011), Lewis and Wigen (2019), Copeaux (1997), Brennetot and Rosemberg (2013).

Earth/Natural regions: Atlas

So-called “physical maps” in atlases are a good source:

World/Political regions : IGO

Source : https://commons.wikimedia.org/wiki/Atlas_of_international_organizations

World/Political regions : Other …

A cross-language perspective

We propose to establish a dictionary of Earth and World regions in the five languages of interest for the IMAGEUN project:

We want to avoid any “Eurocentric” or “Anglocentric” perspective in the definition of entities. Therefore, our definition of entities will follow these rules:

  1. Non-universal: entities will not necessarily be available in all languages.
  2. Non-equivalent: translation of names does not imply equivalence of entities.
  3. Non-hierarchical: an entity has a different definition in each language. No language can be considered as “pivot” or “reference”.

Entities equivalences and lexical universes

To summarize, we propose to build partial equivalences between entities that belong to different lexical universes.

The comparison between lexical universes will necessarily be limited to a small sample of entities that we can assume to be approximately equivalent.

2. WIKIDATA

Wikidata defines itself as

Codification of entities

The first benefit of Wikidata is that it provides unique identification codes for objects. For example, a search for “Africa” returns a list of different objects, each characterized by a unique code:
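As an illustration of this lookup step, here is a minimal sketch in Python, assuming the public Wikidata API endpoint and its `wbsearchentities` action. The helper names `build_search_url` and `candidates` are hypothetical, and the sample payload is hand-written in the shape of an API response (its second entry is a made-up placeholder, not a real Wikidata item). No network call is made.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(term, language="en", limit=10):
    """Build the URL of a wbsearchentities request for a search term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

def candidates(payload):
    """Return (code, label, description) for each candidate entity."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]

# Hand-written, abridged payload for illustration only.
sample_payload = json.loads("""
{"search": [
  {"id": "Q15", "label": "Africa", "description": "continent"},
  {"id": "Q-hypothetical", "label": "Africa"}
]}
""")

for code, label, description in candidates(sample_payload):
    print(code, label, description)
```

The human coder then chooses among the candidate codes; the sketch only lists them.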

Information on entities

Once we have selected an entity (e.g. Q15), we obtain a new page with more detailed information in English, but also in all the other languages available in Wikipedia.

Information on entities

A lot of information is available about the entity but, at this stage, the most important items for our research are:

  1. the translation in different languages;
  2. the equivalent words or expressions in different languages;
  3. the definitions in different languages;
  4. the existence of an ambiguity of meaning, for example homonyms.

Wikidata makes it possible to formalize a procedure for building dictionaries and to objectify the choices of entities and translations made by the expert coders (us). It should lead to the construction of “specialized” dictionaries for the analysis of geographical entities, through discussion between native speakers of the different languages in the project.
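The extraction of labels, equivalent expressions and definitions can be sketched as follows. This is a minimal sketch, assuming an entity record shaped like the JSON returned by the Wikidata `wbgetentities` action (`labels`, `aliases`, `descriptions` keyed by language code). The function name `dictionary_entry` is hypothetical, and the sample record is a hand-written abridgement using values quoted in this note for Q15. Arabic is deliberately left out of the sample to show how a missing language is handled, not because Wikidata lacks it.

```python
LANGUAGES = ["fr", "en", "de", "tr", "ar"]

def dictionary_entry(entity, languages=LANGUAGES):
    """Collect label, aliases and description per language, without any pivot language."""
    entry = {}
    for lang in languages:
        label = entity.get("labels", {}).get(lang, {}).get("value")
        if label is None:
            # "Non-universal" rule: an entity may be absent in a language.
            continue
        aliases = [a["value"] for a in entity.get("aliases", {}).get(lang, [])]
        description = entity.get("descriptions", {}).get(lang, {}).get("value")
        entry[lang] = {"label": label, "aliases": aliases, "description": description}
    return entry

# Hand-written, abridged record for illustration only.
sample_entity = {
    "id": "Q15",
    "labels": {
        "fr": {"language": "fr", "value": "Afrique"},
        "en": {"language": "en", "value": "Africa"},
        "de": {"language": "de", "value": "Afrika"},
        "tr": {"language": "tr", "value": "Afrika"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "African continent"}],
        "tr": [{"language": "tr", "value": "Afrika kıtası"}],
    },
    "descriptions": {
        "de": {"language": "de", "value": "Kontinent auf der Nord- und Südhalbkugel der Erde"},
    },
}

entry = dictionary_entry(sample_entity)
for lang, item in entry.items():
    print(lang, item["label"], item["aliases"], item["description"])
```

Each language keeps its own label, aliases and definition, so the output remains a proposal to be discussed by native speakers rather than a validated translation table.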

Multilanguage definitions

The specificity of the Wikidata ontology is that it is a multilingual web in which Q15 is a node present in different linguistic layers. This means that we do not have a single name or a single definition of Q15, unless we choose English as the reference language. Depending on the context (i.e. the language or sub-language), Q15 could be defined as:

| language | definition |
|---|---|
| fr | A continent named “Afrique” |
| en | A “continent on the Earth’s northern and southern hemispheres” named “Africa” or “African continent” |
| de | A “Kontinent auf der Nord- und Südhalbkugel der Erde” named “Afrika” |
| tr | A “Dünya’nın kuzey ve güney yarıkürelerindeki bir kıta” named “Afrika” or “Afrika kıtası” |
| ar | “The second largest continent in the world in terms of area and population, comes second only to Asia” (translated) |

Correspondence between entities?

The fact that Wikidata entities share the same code offers no guarantee of concordance between the geographical objects found in news published in different languages or different countries. But, and this is the important point, it helps us to identify similarities and differences between sets of geographical entities that are more or less comparable in each language.

Cross-language perspective

Bearing in mind the limits of the equivalence of entities across languages, it can nevertheless be an interesting experiment to select a set of Wikidata entities (Q15, Q258, Q4412 …) and to examine their relative frequency in our media from different countries and in different languages. A typical hypothesis could be something like:

which is not equivalent to the question

but rather equivalent to the two joint questions

Workflow in a nutshell

We propose a semi-automatic method for extracting entities in different languages, which involves a human expert at each step of the analysis. The figure below describes an example of a search for world regions related to Africa in three languages.

The programs used for the computer implementation are explained in the media cookbook on GitHub, with an example of implementation available on the following page

3. EXPERIMENTS

We have tested the previous workflow on an arbitrary selection of 110 entities:

  1. 65 entities related to continents and “natural” Earth divisions;
  2. 45 regional organizations mentioned by Wikimedia: NATO, EU, CIS, NAFTA, …

Warning: this analysis does not offer a perfect guarantee of quality because:

  1. The list of entities has not been discussed by the IMAGEUN partners.
  2. The dictionary established in the different languages has not been checked by native speakers.

The purpose is therefore only to provide food for thought.

Data

We start from a corpus of texts in which the target Wikidata entities have been recognized:

| text | source | date | regs | nbregs |
|---|---|---|---|---|
| Europa und Südamerika: EU und Mercosur beschließen weltweit größte Freihandelszone | de_DEU_suddeu | 2019-06-28 | CO_EUR CO_AMR_south OR_EU OR_Merco | 4 |
| Asie, Afrique, Europe: la nouvelle stratégie de l’État islamique | fr_FRA_figaro | 2019-05-03 | CO_ASI CO_AFR CO_EUR | 3 |
| Présidentielle américaine: Europe, Asie, Otan. Le monde retient son souffle | fr_FRA_figaro | 2020-11-02 | CO_EUR CO_ASI OR_NATO | 3 |
| ‘Rolling emergency’ of locust swarms decimating Africa, Asia and Middle East | en_GBR_guardi | 2020-06-08 | CO_AFR CO_ASI LA_east_middle | 3 |
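The recognition step that fills the `regs` and `nbregs` columns could be sketched as a dictionary lookup per language. This is a minimal sketch under stated assumptions, not the project's actual program: the `DICTIONARY` content is a tiny hand-made excerpt, the function name `tag_regions` is hypothetical, and matching is a plain case-sensitive whole-word search.

```python
import re

# Tiny illustrative excerpt: label -> region code, one dictionary per language.
DICTIONARY = {
    "fr": {"Asie": "CO_ASI", "Afrique": "CO_AFR", "Europe": "CO_EUR", "Otan": "OR_NATO"},
    "en": {"Africa": "CO_AFR", "Asia": "CO_ASI", "Middle East": "LA_east_middle"},
}

def tag_regions(text, lang, dictionary=DICTIONARY):
    """Return the codes of the regions found in text, in order of first appearance."""
    hits = []
    seen = set()
    for label, code in dictionary.get(lang, {}).items():
        # Whole-word match: the label must not be glued to other word characters.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        match = re.search(pattern, text)
        if match and code not in seen:
            seen.add(code)
            hits.append((match.start(), code))
    return [code for _, code in sorted(hits)]

headline = "Asie, Afrique, Europe: la nouvelle stratégie de l’État islamique"
regs = tag_regions(headline, "fr")
print(regs, len(regs))
```

Because each language has its own dictionary, a label is only searched in texts of that language, which keeps homonyms across languages from producing false matches.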

Experiment 1: An inter-language analysis of lexical universes

Experiment 1: Europe/EU (Q46 / Q458)

Experiment 1: Mediterranean (Q4918)

Experiment 2: A cross-language analysis of regional entities

Experiment 2: Data aggregation

For experiment 2, we create a new object called a hypercube, from which the text of the news has been removed and in which we keep only the number of tags, or the proportion of news items, mentioning one or several regions (where1, where2), by media (who) and by time period (when).
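The aggregation can be sketched as a counter keyed by (who, when, where1, where2). This is a minimal sketch under stated assumptions: the time period is taken to be the month, every ordered pair of regions tagged in a news item is counted once (so the cells where where1 equals where2 count the items mentioning that region), and the function name `build_hypercube` is hypothetical. The four input rows are the sample news items of this note.

```python
from collections import Counter
from itertools import product

news = [
    {"source": "de_DEU_suddeu", "date": "2019-06-28",
     "regs": ["CO_EUR", "CO_AMR_south", "OR_EU", "OR_Merco"]},
    {"source": "fr_FRA_figaro", "date": "2019-05-03",
     "regs": ["CO_ASI", "CO_AFR", "CO_EUR"]},
    {"source": "fr_FRA_figaro", "date": "2020-11-02",
     "regs": ["CO_EUR", "CO_ASI", "OR_NATO"]},
    {"source": "en_GBR_guardi", "date": "2020-06-08",
     "regs": ["CO_AFR", "CO_ASI", "LA_east_middle"]},
]

def build_hypercube(items):
    """Count news items by media, month and ordered pair of regions."""
    cube = Counter()
    for item in items:
        who = item["source"]
        when = item["date"][:7]  # assumption: monthly periods (YYYY-MM)
        for where1, where2 in product(item["regs"], repeat=2):
            cube[(who, when, where1, where2)] += 1
    return cube

cube = build_hypercube(news)
print(cube[("fr_FRA_figaro", "2019-05", "CO_ASI", "CO_EUR")])
```

Once the text has been dropped, all later tables and analyses can be derived from this single object by summing over the dimensions that are not of interest.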


Experiment 2: Top 20 regions in full corpus

We first present a table of the top entities in the whole corpus of newspapers.

| rank | code | type | label | nb |
|---|---|---|---|---|
| 1 | OR_EU | org | European Union | 7914 |
| 2 | CO_EUR | cont | Europe | 5518 |
| 3 | CO_AFR | cont | Africa | 1822 |
| 4 | SE_medit | sea | Mediterranean Sea | 979 |
| 5 | OR_NATO | org | NATO | 697 |
| 6 | CO_ASI_minor | cont | Asia Minor | 499 |
| 7 | SE_black | sea | Black Sea | 480 |
| 8 | LA_east_middle | land | Middle East | 382 |
| 9 | CO_ASI | cont | Asia | 372 |
| 10 | CO_AMR | cont | Americas | 346 |
| 11 | LA_sahel | land | Sahel | 282 |
| 12 | LA_alpen | land | Alps | 253 |
| 13 | SE_arcti | sea | Arctic | 200 |
| 14 | LA_mashr | land | Maghreb | 160 |
| 15 | SE_pacif | sea | Pacific Ocean | 159 |
| 16 | CO_AMR_latin | cont | Latin America | 152 |
| 17 | LA_sahara | land | Sahara | 146 |
| 18 | SE_atlan | sea | Atlantic Ocean | 142 |
| 19 | CO_AFR_south | cont | Southern Africa | 130 |
| 20 | LA_amazon | land | Amazonia | 130 |

Experiment 2: Turkish newspapers - Top 10 regions

| rank | Cumhuriyet: region | pct | Yeni Şafak: region | pct |
|---|---|---|---|---|
| 1 | Europe | 30.4 | Europe | 25.1 |
| 2 | European Union | 20.8 | European Union | 20.8 |
| 3 | Asia Minor | 11.9 | Mediterranean Sea | 13.4 |
| 4 | Mediterranean Sea | 9.5 | Black Sea | 10.3 |
| 5 | NATO | 7.5 | NATO | 8.6 |
| 6 | Black Sea | 7.0 | Asia Minor | 7.1 |
| 7 | Africa | 2.4 | Africa | 4.4 |
| 8 | Asia | 1.8 | Asia | 1.6 |
| 9 | Eurasia | 1.3 | Eurasia | 1.3 |
| 10 | Southern Africa | 1.2 | Antarctica | 0.8 |

Experiment 2: German newspapers - Top 10 regions

| rank | FAZ: region | pct | Süd. Zeit.: region | pct |
|---|---|---|---|---|
| 1 | European Union | 48.7 | European Union | 53.3 |
| 2 | Europe | 24.9 | Europe | 20.7 |
| 3 | Africa | 3.4 | Middle East | 4.3 |
| 4 | Americas | 3.3 | Africa | 3.6 |
| 5 | Southern Africa | 2.5 | Mediterranean Sea | 3.0 |
| 6 | Mediterranean Sea | 2.0 | Alps | 2.0 |
| 7 | Asia | 1.7 | Southern Africa | 1.4 |
| 8 | Middle East | 1.5 | Near East | 1.0 |
| 9 | Alps | 1.4 | South America | 1.0 |
| 10 | Eastern Europe | 0.8 | Asia | 0.8 |

Experiment 2: French newspapers - Top 10 regions

| rank | Figaro: region | pct | Le Monde: region | pct |
|---|---|---|---|---|
| 1 | Europe | 28.7 | Europe | 29.3 |
| 2 | European Union | 28.2 | European Union | 20.7 |
| 3 | Americas | 4.3 | Africa | 12.3 |
| 4 | Mediterranean Sea | 4.0 | Sahel | 4.1 |
| 5 | Africa | 3.5 | Mediterranean Sea | 3.4 |
| 6 | Alps | 3.1 | Alps | 3.1 |
| 7 | NATO | 2.7 | Americas | 2.6 |
| 8 | Amazonia | 2.7 | NATO | 2.0 |
| 9 | Sahel | 2.4 | Amazonia | 1.8 |
| 10 | Polynesia | 1.7 | Near East | 1.8 |

Experiment 2: UK newspapers - Top 10 regions

| rank | Guardian: region | pct | Daily Telegraph: region | pct |
|---|---|---|---|---|
| 1 | European Union | 39.2 | European Union | 49.3 |
| 2 | Europe | 22.8 | Europe | 25.9 |
| 3 | Africa | 5.7 | Africa | 7.2 |
| 4 | Arctic | 3.9 | Asia | 2.6 |
| 5 | Middle East | 3.8 | Middle East | 1.4 |
| 6 | Pacific Ocean | 3.4 | Arctic | 1.2 |
| 7 | NATO | 2.5 | Pacific Ocean | 1.1 |
| 8 | Americas | 2.2 | Commonwealth of Nations | 1.0 |
| 9 | Atlantic Ocean | 1.9 | NATO | 1.0 |
| 10 | Latin America | 1.8 | Caribbean | 1.0 |

Experiment 2: Irish newspapers - Top 10 regions

| rank | Irish Times: region | pct | Belfast Telegraph: region | pct |
|---|---|---|---|---|
| 1 | European Union | 64.5 | European Union | 63.0 |
| 2 | Europe | 20.0 | Europe | 19.4 |
| 3 | Africa | 1.9 | Africa | 2.3 |
| 4 | NATO | 1.5 | Commonwealth of Nations | 1.9 |
| 5 | Asia | 1.3 | Arctic | 1.9 |
| 6 | Atlantic Ocean | 1.2 | NATO | 1.6 |
| 7 | Middle East | 1.2 | Middle East | 1.5 |
| 8 | Pacific Ocean | 0.9 | Asia | 1.4 |
| 9 | Maghreb | 0.6 | Caribbean | 1.1 |
| 10 | Americas | 0.6 | Atlantic Ocean | 0.9 |

Experiment 2: Tunisian newspapers - Top 5 regions

| rank | Babnet (ar): region | pct | Econ. Mag: region | pct | La Presse: region | pct | Réalités: region | pct |
|---|---|---|---|---|---|---|---|---|
| 1.0 | Africa | 32.5 | Africa | 32.1 | Africa | 35.2 | Africa | 38.0 |
| 2.0 | European Union | 23.9 | European Union | 26.4 | Mediterranean Sea | 17.2 | European Union | 18.7 |
| 3.0 | Europe | 15.8 | Europe | 9.7 | European Union | 12.8 | Europe | 13.3 |
| 4.5 | Maghreb | 4.8 | Maghreb | 8.0 | Sahel | 9.0 | Sahel | 5.3 |
| 4.5 | Sahel | 4.8 | Mediterranean Sea | 7.2 | Europe | 6.6 | Maghreb | 5.1 |

Experiment 2: Algerian newspapers - Top 10 regions

| rank | Al Nahar (ar): region | pct | El Khabar (ar): region | pct | El Watan (fr): region | pct |
|---|---|---|---|---|---|---|
| 1 | Africa | 40.0 | Africa | 55.7 | Sahara | 25.6 |
| 2 | Europe | 27.5 | Europe | 19.5 | Africa | 19.0 |
| 3 | European Union | 11.7 | European Union | 7.6 | European Union | 11.7 |
| 4 | Sahel | 4.2 | Mediterranean Sea | 3.6 | Sahel | 9.5 |
| 5 | Asia | 3.9 | Asia | 3.2 | Europe | 8.9 |
| 6 | Middle East | 2.5 | Middle East | 2.6 | Maghreb | 7.9 |
| 6 | Arab League | 2.5 | Maghreb | 1.8 | Near East | 2.5 |
| 8 | Mediterranean Sea | 2.0 | Sahel | 1.8 | North Africa | 2.2 |
| 10 | Maghreb | 1.2 | Arab League | 1.2 | West Africa | 1.9 |
| 10 | NATO | 1.2 | NATO | 0.8 | Middle East | 1.9 |

Experiment 2: Correspondence analysis - Factors 1-2

N.B. We have eliminated the units “Americas,” “Europe” and “European Union”
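For readers unfamiliar with the method, the quantity that correspondence analysis decomposes can be sketched in a few lines. This is a minimal sketch, not the code used for the figures: it computes the standardized residuals of a media × region contingency table and the total inertia (the chi-square statistic divided by the table total). The factorial axes themselves come from the singular value decomposition of this residual matrix, which is not computed here. The counts are hypothetical and the sketch assumes no empty row or column.

```python
from math import sqrt

def standardized_residuals(table):
    """Return S with S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j)."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    return [
        [
            (table[i][j] / n - row_mass[i] * col_mass[j])
            / sqrt(row_mass[i] * col_mass[j])
            for j in range(len(col_mass))
        ]
        for i in range(len(row_mass))
    ]

def total_inertia(table):
    """Sum of squared standardized residuals (chi-square divided by n)."""
    return sum(s * s for row in standardized_residuals(table) for s in row)

# Hypothetical counts: rows are media, columns are regions.
counts = [
    [120, 30, 10],
    [40, 90, 25],
    [15, 20, 60],
]
print(round(total_inertia(counts), 3))
```

A total inertia of zero would mean that all media give the same relative weight to each region; the larger the value, the more the regional profiles of the media differ.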


Experiment 2: Factors 3-4

Experiment 2: Cluster analysis (world regions)

Experiment 2: Cluster analysis (media)

Bibliography

Brennetot, Arnaud, and Muriel Rosemberg. 2013. “Géographie de l’Europe et géographie de la construction européenne.” L’Espace Politique, no. 19 (April). https://doi.org/10.4000/espacepolitique.2613.
Cholley, André. 1939. “Régions naturelles et régions humaines.” L’information géographique 4 (2): 40–42. https://doi.org/10.3406/ingeo.1939.5013.
Copeaux, Étienne. 1997. “Chapitre III. Les Instances de Production Du Discours Historique Scolaire.” In, 103–16. CNRS Éditions. https://doi.org/10.4000/books.editionscnrs.35363.
Grataloup, Christian. 2011. “La fausse neutralité des continents.” Revue internationale et stratégique 82 (2): 97. https://doi.org/10.3917/ris.082.0097.
Lewis, Martin W., and Kären Wigen. 2019. The Myth of Continents. University of California Press. https://doi.org/10.1525/9780520918597.